# Table of Contents

- Introduction
- Clusters Scheme
- RAW Tags
- Processed Tags
- Python Script
    - Requirements
    - Usage
- Dictionary Contents

# Introduction

This readme file explains the current dataset structure and contents. It was written by the [Transparency in Algorithms Group](http://www.rise.org.cy/en-gb/research/research-groups/transparency-in-algorithms-mrg/) at [RISE](http://www.rise.org.cy/), Nicosia, Cyprus.

# Clusters Scheme

After processing the responses of all the taggers (image tagging APIs), we arrived at the super-clusters and sub-clusters shown below:

- Demographic [super]
    - Masculine
    - Feminine
    - Age
    - Race

- Concrete [super]
    - Actions
    - Body
    - Hair
    - Clothing
    - Colors
    - Meta
    - Shape
    - Location
    - Food
    - Item

- Abstract [super]
    - Judgement
    - Traits
    - Emotion
    - Occupation

- Inflammatory [super]

- Other [super]
    - Ambiguous
    - Lack
    - Misc
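
For programmatic use, the scheme above can be written down as a plain mapping. The following is only a sketch: the names `CLUSTER_SCHEME` and `super_cluster_of` are illustrative and are not part of the dataset or of the script described later.

```python
# Cluster scheme transcribed from the list above.
# An empty list means the super-cluster has no sub-clusters.
CLUSTER_SCHEME = {
    "Demographic": ["Masculine", "Feminine", "Age", "Race"],
    "Concrete": ["Actions", "Body", "Hair", "Clothing", "Colors",
                 "Meta", "Shape", "Location", "Food", "Item"],
    "Abstract": ["Judgement", "Traits", "Emotion", "Occupation"],
    "Inflammatory": [],
    "Other": ["Ambiguous", "Lack", "Misc"],
}


def super_cluster_of(sub_cluster):
    """Return the super-cluster containing the given sub-cluster name.

    The comparison ignores case; a KeyError is raised for unknown names.
    """
    wanted = sub_cluster.lower()
    for super_name, subs in CLUSTER_SCHEME.items():
        if wanted in (s.lower() for s in subs):
            return super_name
    raise KeyError(sub_cluster)


print(super_cluster_of("hair"))  # Concrete
```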

# RAW Tags

`6 .xls files with 10 sheets (1 file per ITA, 1 sheet per image, 1 sheet for only backgrounds)`

The outputs for the images are separated by ITA (resulting in 6 `.xls` files) and by background (the first 9 sheets in each file). The first column in every sheet holds the image identifier (e.g. AF-248, the identifier used in the CFD for that individual) and each of the following columns holds one raw tag. Each sheet is titled with the context (background) of the images it contains, and has 598 rows: the title row (`Target`, `Labels`) and one row for each of the 597 people images. The last sheet in every file contains the tags for the backgrounds themselves, for a total of 9 rows: the title row (`Target`, `Labels`) and one row for each of the eight background images. Its columns are structured in the same way as in the previous sheets.
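
The row layout described above (identifier first, then one tag per column) can be turned into a lookup table along these lines. This is a sketch only: `rows_to_tags` is a hypothetical helper, the tags in the example are made up, and treating empty cells as "no tag" is an assumption about how shorter rows are padded.

```python
def rows_to_tags(rows):
    """Map each image identifier to its list of raw tags.

    `rows` is any iterable of sheet rows (sequences of cell values) with the
    title row first. Empty cells (None or "") are skipped, on the assumption
    that rows with fewer tags are padded with empty cells.
    """
    rows = iter(rows)
    next(rows, None)  # skip the title row (`Target`, `Labels`)
    tags_by_image = {}
    for row in rows:
        if not row:
            continue  # ignore completely empty rows
        identifier, *cells = row
        tags_by_image[identifier] = [c for c in cells if c not in (None, "")]
    return tags_by_image


# Made-up example rows in the layout of one sheet.
sheet = [
    ["Target", "Labels"],
    ["AF-248", "person", "long hair", ""],
]
print(rows_to_tags(sheet))
```

The rows can come from any spreadsheet reader that yields one sequence of cell values per row.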

# Processed Tags

`6 .xls files with 10 sheets (1 file per ITA, 1 sheet per image, 1 sheet for only backgrounds)`

The `.xls` files in this output are almost identical to the RAW tags described above. The only difference is that the tags have been standardized: space and hyphen characters have been replaced with an underscore and all characters have been converted to lower case. Therefore, the number and type of rows, columns, sheets, and files are identical to the RAW tags.
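
The standardization step can be sketched as below. `standardize_tag` is an illustrative name, not the code that produced the files, and it assumes that every space or hyphen becomes exactly one underscore (runs are not collapsed and nothing is trimmed).

```python
def standardize_tag(tag):
    """Replace spaces and hyphens with underscores and lower-case the tag."""
    return tag.replace(" ", "_").replace("-", "_").lower()


# Made-up raw tags for illustration.
print(standardize_tag("Long Hair"))  # long_hair
print(standardize_tag("T-shirt"))    # t_shirt
```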

# Python Script

You can download the script from our GitHub repository at https://github.com/oliviaguest/CFD-backgrounds. The `create_stimuli.py` script was created by Olivia Guest and the [Transparency in Algorithms Group](http://www.rise.org.cy/en-gb/research/research-groups/transparency-in-algorithms-mrg/) at [RISE](http://www.rise.org.cy/), Nicosia, Cyprus, for research purposes.

This script takes images representing the different backgrounds and normalizes them. The normalization procedure, carried out in the function `crop_backgrounds`, sets all the images to the same width as the smallest amongst them and then crops equal amounts from the top and bottom, so that all the background images have exactly the same dimensions. Once normalization is complete, the foreground images are read in and pasted onto the normalized backgrounds. Finally, the code ensures that the ratio of the foreground's non-transparent pixels to the rest of the pixels in the merged image takes on a specified value.
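
The geometry of the normalization step can be sketched without touching any image data. The function below is not the repository's `crop_backgrounds`; `plan_normalization` is a hypothetical helper. It assumes that "smallest" refers to the smallest width, that resizing keeps the aspect ratio, and that the common height is the smallest height after resizing.

```python
def plan_normalization(sizes):
    """Plan the resize-and-crop step for a list of (width, height) sizes.

    Returns ((target_w, target_h), boxes), where boxes[i] is the
    (left, upper, right, lower) crop box for image i *after* it has been
    resized to the common width.
    """
    target_w = min(w for w, _ in sizes)
    # Height of each image once scaled to the common width (aspect ratio kept).
    scaled_h = [round(h * target_w / w) for w, h in sizes]
    target_h = min(scaled_h)
    boxes = []
    for h in scaled_h:
        # Split the excess height between top and bottom; if the excess is
        # odd, the extra pixel is removed from the bottom.
        top = (h - target_h) // 2
        boxes.append((0, top, target_w, top + target_h))
    return (target_w, target_h), boxes


size, boxes = plan_normalization([(1000, 800), (500, 300)])
print(size)   # (500, 300)
print(boxes)  # [(0, 50, 500, 350), (0, 0, 500, 300)]
```

The boxes use the (left, upper, right, lower) convention, so with Pillow each one could be passed to `Image.crop` after an `Image.resize` to the common width.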

## Requirements

The script needs the following libraries installed in order to run successfully:
- PIL (Pillow v6.1.0+ from https://pypi.org/project/Pillow/)
- numpy (v1.16.4)
- pandas (v0.24.2)

Requirements can be automatically generated by [pigar](https://github.com/damnever/pigar).

## Usage

First, fill in the `directories.py` script with the paths of the following required directories:

- fg_dir : Where the faces for the foregrounds are saved (e.g. transparent CFD images)
- bg_dir : Where the backgrounds are saved
- cropped_dir : Where the cropped backgrounds will be saved
- stimuli_dir : Where the final stimuli will be saved (foreground on background)
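
As an illustration only, a filled-in `directories.py` might look like the sketch below. It assumes the file defines the four names as plain module-level strings, which should be checked against the file in the repository; every path shown is a placeholder.

```python
# directories.py (sketch; the actual layout of this file is an assumption)
# All paths below are placeholders and must be replaced with your own.

fg_dir = "/path/to/foregrounds"        # transparent face images
bg_dir = "/path/to/backgrounds"        # original background images
cropped_dir = "/path/to/cropped"       # normalized backgrounds are written here
stimuli_dir = "/path/to/stimuli"       # final stimuli are written here
```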


To run with a ratio of, e.g., 0.17, pass the `-r` flag followed by `0.17`:
```
python create_stimuli.py -r 0.17
```

If no `-r` flag is given, the script uses `0.25` as the default ratio.
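
To make the meaning of the ratio concrete, here is a worked sketch of how a foreground could be rescaled to hit a given value. It is not taken from `create_stimuli.py`: `foreground_scale` is a hypothetical helper, and it assumes the ratio is foreground pixels divided by all remaining pixels, that the scaled foreground fits entirely inside the merged image, and that its pixel count grows with the square of the linear scale.

```python
import math


def foreground_scale(fg_pixels, total_pixels, ratio=0.25):
    """Linear scale factor that makes fg / (total - fg) equal to `ratio`.

    fg_pixels:    non-transparent foreground pixels at the current size
    total_pixels: number of pixels in the merged image (width * height)
    """
    if fg_pixels <= 0 or total_pixels <= 0 or ratio <= 0:
        raise ValueError("fg_pixels, total_pixels and ratio must be positive")
    # Solve fg / (total - fg) = ratio for the target foreground pixel count.
    target_fg = ratio * total_pixels / (1 + ratio)
    # Pixel count scales with the square of the linear scale factor.
    return math.sqrt(target_fg / fg_pixels)


# A 1000 x 1000 image whose foreground currently covers 50,000 pixels.
print(foreground_scale(50_000, 1_000_000, ratio=0.25))
```

With these numbers the target foreground is 200,000 pixels (200,000 / 800,000 = 0.25), so the foreground would be enlarged by a factor of 2 in each dimension.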

Once the script has run, the stimuli will be in the directory you have chosen along with a CSV file: `stimuli.csv`, which contains useful details per stimulus.

# Dictionary Contents

In the `DICTIONARY` directory you can find a set of CSV files, each of which maps to a super- or sub-cluster and its tags. The cluster name is given in the filename, and the tags are listed as the file's content in a single column.

Please note that some of the super-clusters are not represented here, as they can be computed as the union of their sub-cluster tags. There are 22 files in total:

- actions.csv
- age.csv
- ambiguous.csv
- body.csv
- clothing.csv
- colors.csv
- emotion.csv
- feminine.csv
- food.csv
- hair.csv
- inflammatory.csv
- item.csv
- judgement.csv
- lack.csv
- location.csv
- masculine.csv
- meta.csv
- misc.csv
- occupation.csv
- race.csv
- shape.csv
- traits.csv
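
The union mentioned above can be computed along the following lines. This is a sketch: `load_tags` and `super_cluster_tags` are hypothetical helpers, and the code assumes that each CSV file has one tag per row with no header row and is UTF-8 encoded.

```python
import csv
from pathlib import Path


def load_tags(csv_path):
    """Read a one-column CSV file into a set of tags, skipping blank rows."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return {row[0].strip() for row in csv.reader(handle)
                if row and row[0].strip()}


def super_cluster_tags(dictionary_dir, sub_clusters):
    """Return the union of the tags of the given sub-clusters.

    Each sub-cluster name is lower-cased to obtain its filename,
    e.g. "Masculine" -> "masculine.csv".
    """
    tags = set()
    for name in sub_clusters:
        tags |= load_tags(Path(dictionary_dir) / f"{name.lower()}.csv")
    return tags
```

For example, `super_cluster_tags("DICTIONARY", ["Masculine", "Feminine", "Age", "Race"])` would give the tags of the Demographic super-cluster.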
